18 research outputs found

    The Analysis of Student Traces for Q-Matrix Refinement and Knowledge Tracing

    We are witnessing a surge in self-directed learning made possible by the Internet and learning environments. The accessibility of MOOCs and computer-based learning environments is one manifestation of it. In return, the learner loses the personalized guidance of a human tutor, and the development of adaptive learning environments aims to fill this gap. To offer guidance and personalization throughout the learning process, it is essential to assess the learner's acquired knowledge accurately and to adapt the teaching material accordingly. Research in intelligent tutoring systems and educational data analytics aims essentially to develop knowledge models that can support the personalization of learning. In this thesis I propose new approaches to modeling learner knowledge along two axes. The first concerns validating, from data, the knowledge and skills underlying tasks. Classifying questions and exercises into a taxonomy of learning objectives is a practical example of identifying underlying skills, one that teachers and educators carry out routinely. Researchers in Cognitive Diagnostic Modeling go further by identifying, for example, several pieces of knowledge and skills behind a single problem to be solved. This exercise is inherently difficult and error-prone. Research aimed at facilitating the validation of underlying knowledge is known as the refinement of a Q-matrix, which represents the alignment of tasks with the knowledge they require. The last decade has seen important developments in data-driven approaches to Q-matrix refinement. 
This refinement process can be regarded as a classification problem: for each task-knowledge alignment defined by the expert, the classification algorithm must decide whether it is correct or incorrect. Whereas most algorithms make a decision for each individual alignment, we propose a classification approach based on multi-class algorithms in which the whole set of knowledge required by a task is submitted, rather than each piece of knowledge individually. The results show that the refinement is generally of better quality than that of state-of-the-art techniques. The second axis aims to improve deep learning models for assessing learner knowledge from sequential traces of success or failure on tasks. We rely on a knowledge assessment model capable of capturing the temporal evolution of the knowledge profile, which changes throughout the learner's learning process. Deep learning algorithms using an LSTM (Long Short-Term Memory) architecture aim at this goal of memorizing temporal information and do succeed in predicting learner performance better. But the knowledge profile is a more explicit mechanism for the knowledge state reached, and a more effective one for summarizing that state. We therefore integrate this mechanism into an LSTM architecture and into a memory network architecture in order to validate this hypothesis. 
The knowledge profile is modeled as classes, and this information is encoded as a binary vector of unit length (one-hot) that is supplied as input to the deep learning models.
ABSTRACT: The growth of self-learning, enabled by the availability on the Internet of different forms of didactic material such as MOOCs and tutoring systems, in turn increases the relevance of personalized instruction for students in adaptive learning environments. To provide adaptive and personalized instruction, assessing a student's mastery of a topic and estimating when she actually knows how to answer problems correctly is recognized as paramount in the learning analytics and educational data mining communities. In this dissertation, I propose novel approaches for building skill and student learning models along two axes. The first axis is to recover and ensure the quality of the skill sets behind problems in a learning system. The second axis is to improve the accuracy of student performance prediction by using the student's ability profile on skills and by dynamically taking problem difficulty into account. The two axes are complementary and essential to knowledge assessment in future educational learning systems, equipping intelligent agents to provide adaptive instruction and an independent learning environment for students. The first axis is referred to as the Q-matrix refinement problem and consists in validating an expert-defined mapping of exercises and tasks to underlying skills. The last decade has witnessed a wealth of data-driven approaches aiming to refine expert-defined mappings. This refinement can be seen as a classification problem: for each possible mapping of task to skill, the classifier has to decide whether the expert's advice is correct or incorrect. 
Whereas most algorithms work at the level of individual mappings, we introduce an approach based on a multi-label classification algorithm that is trained on the mapping of a task to all skills simultaneously. This approach improves Q-matrix validation methods by using a supervised multi-label classifier. Results show that it outperforms existing Q-matrix refinement techniques. The second axis aims to improve deep learning models of skills assessment based on sequential data. The student skills model needs to capture the temporal nature of student knowledge, which changes over time based on the learning transferred from previous practice. Deep learning has achieved considerable success in student performance prediction with models relying on Long Short-Term Memory (LSTM). We propose two approaches, Deep Knowledge Tracing and Dynamic Student Classification (DKT-DSC) and Dynamic Student Classification on Memory Networks (DSCMN), based on LSTM and key-value memory networks. We apply k-means clustering to capture each student's temporal ability profile at each time interval, which serves as a transfer learning mechanism across the student's long-term learning process. DKT-DSC captures the temporal ability profile and uses it at the same time in assessing the knowledge mastery state. The second approach, DSCMN, uses problem difficulty in predicting student performance. According to experimental results, these approaches improve student performance prediction over other state-of-the-art methods (such as BKT and PFA).
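
To make the ability-profile idea concrete, here is a minimal sketch of how a student could be assigned to a cluster and how that cluster could be encoded as a one-hot vector for a downstream model. It is not the thesis implementation: the centroids, the two-skill feature layout, and the function names are hypothetical, and only the assignment step of k-means is shown (centroid fitting is omitted).

```python
def nearest_centroid(point, centroids):
    """Return the index of the closest centroid (squared Euclidean distance).

    This is the assignment step of k-means; the centroids are assumed to
    have been fitted beforehand.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(centroids)), key=lambda k: sq_dist(point, centroids[k]))


def one_hot(index, size):
    """Encode a class index as a binary vector with a single 1."""
    return [1 if i == index else 0 for i in range(size)]


# Hypothetical centroids over per-skill success rates (2 skills, 3 clusters).
centroids = [[0.2, 0.2], [0.5, 0.8], [0.9, 0.9]]

# Hypothetical success rates of one student in the current time interval.
student_rates = [0.55, 0.7]

cluster = nearest_centroid(student_rates, centroids)
profile = one_hot(cluster, len(centroids))
```

Recomputing `profile` at every time interval would give the sequence of class vectors that the abstract describes as input to the deep learning models.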

    Fast & Efficient Learning of Bayesian Networks from Data: Knowledge Discovery and Causality

    Structure learning is essential for Bayesian networks (BNs) as it uncovers causal relationships and enables knowledge discovery, prediction, inference, and decision-making under uncertainty. Two novel algorithms, FSBN and SSBN, based on the PC algorithm, employ a local search strategy and conditional independence tests to learn the causal network structure from data. They incorporate d-separation to infer additional topology information, prioritize conditioning sets, and terminate the search immediately and efficiently. FSBN achieves up to a 52% reduction in computation cost, while SSBN surpasses it with a 72% reduction for a 200-node network. SSBN demonstrates further efficiency gains due to its intelligent strategy. Experimental studies show that both algorithms match the induction quality of the PC algorithm while significantly reducing computation costs. This enables them to offer interpretability and adaptability while reducing the computational burden, making them valuable for various applications in big data analytics.
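
For readers unfamiliar with the conditional independence tests that PC-style algorithms rely on, the sketch below shows one common choice for continuous data: a first-order partial correlation followed by a Fisher z-test. The abstract does not say which test FSBN and SSBN use, so this is an assumption for illustration only, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y given a single variable Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_independent(r, n, cond_size, alpha=0.05):
    """Return True when independence is NOT rejected at level alpha.

    r         : (partial) correlation, strictly between -1 and 1
    n         : number of samples
    cond_size : number of variables in the conditioning set
    """
    z = 0.5 * math.log((1 + r) / (1 - r))          # Fisher z-transform
    stat = math.sqrt(n - cond_size - 3) * abs(z)   # approx. standard normal under H0
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return stat <= critical
```

In a PC-style search, an edge between X and Y would be removed as soon as some conditioning set makes such a test report independence, which is why the order in which conditioning sets are tried affects the computation cost.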

    Privacy-Preserving Synthetic Educational Data Generation

    Institutions collect massive learning traces but may not disclose them for privacy reasons. Synthetic data generation opens new opportunities for research in education. In this paper we present a generative model for educational data that can preserve the privacy of participants, and an evaluation framework for comparing synthetic data generators. We show how naive pseudonymization can lead to re-identification threats and suggest techniques to guarantee privacy. We evaluate our method on existing massive open educational datasets.
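
The re-identification risk of naive pseudonymization can be illustrated with a small uniqueness check: replacing names with random identifiers does not help if the remaining attributes single a person out. The records, field names, and helper below are hypothetical and are not taken from the paper.

```python
from collections import Counter


def unique_on_quasi_identifiers(records, quasi_ids):
    """Return the records whose quasi-identifier combination occurs exactly once."""
    def key(record):
        return tuple(record[q] for q in quasi_ids)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] == 1]


# Hypothetical pseudonymized learning records.
records = [
    {"pseudo_id": "a91f", "course": "stats101", "age_band": "18-25", "city": "Lyon"},
    {"pseudo_id": "07c2", "course": "stats101", "age_band": "18-25", "city": "Lyon"},
    {"pseudo_id": "e3b8", "course": "stats101", "age_band": "46-55", "city": "Brest"},
]

# Learners who are unique on (course, age_band, city) could be re-identified
# by anyone who already knows those three facts about them.
at_risk = unique_on_quasi_identifiers(records, ["course", "age_band", "city"])
```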

    Interpretable Knowledge Tracing: Simple and Efficient Student Modeling with Causal Relations

    Intelligent Tutoring Systems have become critically important in future learning environments. Knowledge Tracing (KT) is a crucial part of such systems. It is about inferring the skill mastery of students and predicting their performance in order to adjust the curriculum accordingly. Deep learning-based KT models have shown significant predictive performance compared with traditional models. However, it is difficult to extract psychologically meaningful explanations that relate to cognitive theory from the tens of thousands of parameters in neural networks. There are several ways to achieve high accuracy in student performance prediction, but diagnostic and prognostic reasoning are more critical in the learning sciences. Since the KT problem has few observable features (problem ID and the student's correctness at each practice), we extract meaningful latent features from students' response data by using machine learning and data mining techniques. In this work, we present Interpretable Knowledge Tracing (IKT), a simple model that relies on three meaningful latent features: individual skill mastery, ability profile (learning transfer across skills), and problem difficulty. IKT predicts future student performance using a Tree-Augmented Naive Bayes (TAN) classifier, so its predictions are easier to explain than those of deep learning-based student models. IKT also predicts student performance better than deep learning-based student models without requiring a huge number of parameters. We conduct ablation studies on each feature to examine its contribution to student performance prediction. Thus, IKT has great potential for providing adaptive and personalized instruction with causal reasoning in real-world educational systems.

    Dynamic student classification on memory networks for knowledge tracing

    Pinnacle lab for analytics at Singapore Management Universit

    Relationship Between Serum Testosterone, Leptin, Interleukin-6 (IL-6) Level and Insulin Sensitivity in Non-obese and Obese Male Subjects in Magway Region, Myanmar

    Objective: To determine the relationship between insulin resistance and related variables (serum testosterone, interleukin-6 (IL-6), and leptin levels) in obese and non-obese healthy subjects. Methods: A community-based, cross-sectional, analytic study was undertaken in 60 subjects in each of the obese group (BMI ≥ 30.0 kg/m²) and the non-obese group (BMI 18.5 to 24.9 kg/m²) (age 18-45 years) residing in Magway Township from December 2016 to December 2017. Serum insulin, testosterone, IL-6, and leptin levels were measured by enzyme-linked immunoassay, and fasting serum glucose was measured by the glucose oxidase method. Insulin sensitivity was calculated by the HOMA (Homeostatic Model Assessment) formula. Results: HOMA-IR, serum leptin, and IL-6 levels were significantly higher in the obese group, while the serum testosterone level was significantly lower in the obese group. HOMA-IR was significantly correlated with leptin (r=0.306, p=0.001), IL-6 (r=0.237, p=0.009), and testosterone (r=-0.209, p=0.02). Moreover, serum leptin was significantly and positively correlated with IL-6 (r=0.391, p<0.001), while serum testosterone was significantly and negatively correlated with leptin (r=-0.408, p<0.001) and IL-6 (r=-0.34, p<0.001). Conclusions: Obese men are more likely to have low testosterone and high levels of the inflammatory markers leptin and IL-6, which were associated with decreased insulin sensitivity.
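
The abstract does not state the exact HOMA formula or the units used. As an illustration, the sketch below assumes the commonly cited HOMA-IR form, fasting insulin (µU/mL) × fasting glucose (mmol/L) / 22.5, together with the equivalent form for glucose in mg/dL (divisor 405). The function names are hypothetical.

```python
def homa_ir(fasting_insulin_uU_per_mL, fasting_glucose_mmol_per_L):
    """HOMA-IR with glucose in mmol/L (assumed standard form, divisor 22.5)."""
    return fasting_insulin_uU_per_mL * fasting_glucose_mmol_per_L / 22.5


def homa_ir_mg_dl(fasting_insulin_uU_per_mL, fasting_glucose_mg_per_dL):
    """Same index with glucose in mg/dL (divisor 405 = 22.5 * 18)."""
    return fasting_insulin_uU_per_mL * fasting_glucose_mg_per_dL / 405.0


# Hypothetical subject: insulin 10 uU/mL, glucose 5.0 mmol/L (90 mg/dL).
example_mmol = homa_ir(10.0, 5.0)
example_mgdl = homa_ir_mg_dl(10.0, 90.0)
```

A higher HOMA-IR value indicates greater insulin resistance, that is, lower insulin sensitivity, which matches the direction of the correlations reported above.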

    AI-assisted knowledge assessment techniques for adaptive learning environments

    The growth of online learning, enabled by the availability on the Internet of different forms of didactic materials such as MOOCs and Intelligent Tutoring Systems (ITS), in turn increases the relevance of personalized instruction for students in an adaptive learning environment. There is increasing interest in, and there are many challenges to, applying Artificial Intelligence (AI) techniques in educational settings to provide adaptive learning content to learners. Knowledge assessment is necessary for providing an adaptive learning environment, and a student model serves as its fundamental building block. This paper reviews the development of the dominant families of student models, from psychometric theory in early educational research to recent adaptations and advances with machine learning and deep learning techniques. Our review covers not only the important families of student models but also why they were invented, from both theoretical and practical viewpoints and from AI and educational perspectives. We believe that the discussion in this review will be a valuable introductory reference on AI for educational researchers, as well as an effort to introduce basic psychometric perspectives on knowledge assessment to AI experts working in the learning sciences. Finally, we present recent challenges and some potential directions for developing efficient knowledge assessment techniques in future adaptive learning ecosystems.
